
    Improving polyphonic and poly-instrumental music to score alignment

    Music alignment links events in a score to points on the time axis of an audio performance, so that every part of a recording can be indexed according to score information. The automatic alignment presented in this paper is based on a dynamic time warping method. Local distances are computed from the signal's spectral features through an attack-plus-sustain note model. The method is applied to mixtures of harmonic sustained instruments, excluding percussion for the moment. Good alignment has been obtained for polyphony of up to five instruments. The method is robust to difficulties such as trills, vibratos and fast sequences, and it provides an accurate indicator of the position of interpretation errors and of extra or forgotten notes. Implementation optimizations allow long sound files to be aligned in a relatively short time. Evaluation results have been obtained on piano jazz recordings.
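    The core of such an alignment is dynamic time warping over a matrix of local distances between score events and audio frames. A minimal sketch (the local distances here are an arbitrary input matrix, not the paper's attack-plus-sustain spectral distances):

    ```python
    import numpy as np

    def dtw(cost):
        """Dynamic time warping over a precomputed local-distance matrix.

        cost[i, j] is the distance between score event i and audio frame j.
        Returns the accumulated cost and the optimal alignment path as a
        list of (score index, frame index) pairs.
        """
        n, m = cost.shape
        acc = np.full((n, m), np.inf)
        acc[0, 0] = cost[0, 0]
        for i in range(n):
            for j in range(m):
                if i == 0 and j == 0:
                    continue
                best = min(
                    acc[i - 1, j] if i > 0 else np.inf,              # advance in score
                    acc[i, j - 1] if j > 0 else np.inf,              # advance in audio
                    acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf # advance in both
                )
                acc[i, j] = cost[i, j] + best
        # Backtrack from the end to recover the optimal path.
        i, j = n - 1, m - 1
        path = [(i, j)]
        while (i, j) != (0, 0):
            candidates = []
            if i > 0 and j > 0:
                candidates.append((acc[i - 1, j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((acc[i - 1, j], (i - 1, j)))
            if j > 0:
                candidates.append((acc[i, j - 1], (i, j - 1)))
            i, j = min(candidates)[1]
            path.append((i, j))
        return acc[-1, -1], path[::-1]
    ```

    A low-cost diagonal in the matrix yields the expected one-to-one alignment; deviations from the diagonal are exactly the places where the score and the performance disagree, which is what makes the method usable as an error indicator.
    
    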

    Stylization and Trajectory Modelling of Short and Long Term Speech Prosody Variations

    In this paper, a unified trajectory model based on the stylization and modelling of f0 variations simultaneously over various temporal domains is proposed. The syllable is used as the minimal temporal domain for the description of speech prosody, and short-term and long-term f0 variations are stylized and modelled jointly. During training, a context-dependent model is estimated from the jointly stylized f0 contours over the syllable and a set of long-term temporal domains. During synthesis, f0 variations are determined using the long-term variations as trajectory constraints. In a subjective evaluation in speech synthesis, the stylization and trajectory modelling of short- and long-term speech prosody variations is shown to model speech prosody consistently and to outperform conventional short-term modelling.
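    A common way to stylize an f0 contour over a syllable is a low-order least-squares fit per unit. The sketch below is a generic illustration of that idea under assumed inputs (per-frame f0 values and syllable boundaries), not the paper's exact parameterization:

    ```python
    import numpy as np

    def stylize_f0(f0, syllable_bounds, order=2):
        """Stylize an f0 contour per syllable with low-order polynomials.

        `f0` is a 1-D array of per-frame f0 values, `syllable_bounds` a
        list of (start, end) frame indices. Each syllable's contour is
        replaced by its least-squares polynomial fit of the given order,
        discarding short-term detail while keeping the syllable's shape.
        """
        styled = f0.astype(float).copy()
        for start, end in syllable_bounds:
            # Normalized time axis within the syllable.
            t = np.linspace(0.0, 1.0, end - start)
            coeffs = np.polyfit(t, f0[start:end], order)
            styled[start:end] = np.polyval(coeffs, t)
        return styled
    ```

    Fitting over longer temporal domains (word, phrase) with the same machinery yields the long-term contours that the trajectory model uses as constraints.
    
    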

    A Multi-Level Context-Dependent Prosodic Model applied to duration modeling

    This paper presents a multi-level context-dependent prosodic model based on the estimation of prosodic parameters on a set of well-defined linguistic units. Different linguistic units are used to represent different scales of prosodic variation (local and global forms) and thus to estimate the linguistic factors that can explain the variations of prosodic parameters independently at each level. The model is applied to the modeling of syllable-based durational parameters on two read speech corpora: laboratory and acted speech. Compared to a syllable-based baseline model, the proposed approach improves performance in terms of the temporal organization of the predicted durations (correlation score) and reduces the model's complexity, while showing comparable performance in terms of relative prediction error. Index Terms: speech synthesis, prosody, multi-level model, context-dependent model.

    Rényi Information Measures for Spectral Change Detection

    Full text link
    Change detection within an audio stream is an important task in several domains, such as classification and segmentation of a sound or of a music piece, as well as indexing of broadcast news or surveillance applications. In this paper we propose two novel methods for spectral change detection that make no assumption about the input sound: both are based on information measures evaluated on a time-frequency representation of the signal, in particular the spectrogram. The class of measures we consider, the Rényi entropies, is obtained by extending the Shannon entropy definition: the dependence of these measures on a parameter biases the spectrogram coefficients, which allows refined results compared to those obtained with standard divergences. The methods have a low computational cost and are well suited as a support for higher-level analysis, segmentation and classification algorithms.
    Comment: 2011 IEEE Conference on Acoustics, Speech and Signal Processing
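    The Rényi entropy of order α generalizes Shannon entropy as H_α(p) = log(Σ p_i^α)/(1 − α); larger α weights the dominant spectrogram coefficients more heavily. A minimal sketch of a per-frame entropy curve whose jumps suggest change points (the detection step here is a hypothetical difference criterion, not the authors' exact divergence):

    ```python
    import numpy as np

    def renyi_entropy(p, alpha=3.0):
        """Rényi entropy (base 2) of a nonnegative distribution p, alpha != 1.

        As alpha -> 1 this tends to the Shannon entropy; larger alpha
        biases the measure towards the dominant coefficients.
        """
        p = p / p.sum()
        return np.log2((p ** alpha).sum()) / (1.0 - alpha)

    def spectral_change(frames, alpha=3.0):
        """Frame-to-frame change curve from Rényi entropies of a spectrogram.

        `frames` is a (num_frames, num_bins) magnitude spectrogram; each
        frame's energy distribution is treated as a probability vector.
        Peaks in the returned curve suggest spectral change points.
        """
        h = np.array([renyi_entropy(f ** 2, alpha) for f in frames])
        return np.abs(np.diff(h))
    ```

    A flat (noise-like) frame has entropy near log2 of the number of bins, while a single-peak (tonal) frame has entropy near zero, so transitions between the two register as large values in the curve.
    
    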

    Prosodic control of unit-selection speech synthesis: A probabilistic approach


    The Importance of Cross Database Evaluation in Sound Classification

    In numerous articles (Martin and Kim, 1998; Fraser and Fujinaga, 1999; and many others), sound classification algorithms are evaluated using "self classification": the learning and test groups are randomly selected from the same sound database. We show that self classification is not necessarily a good indicator of a classification algorithm's ability to learn, generalize or classify well. We introduce the alternative "Minus-1 DB" evaluation method and demonstrate that it does not share the shortcomings of self classification.
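    As described, the Minus-1 DB method is leave-one-database-out evaluation: each database in turn is held out for testing while the classifier is trained on all the others. A minimal sketch, where `train_fn` and `test_fn` are placeholder fit/score callables (an assumed interface, not from the paper):

    ```python
    def minus_one_db_evaluation(databases, train_fn, test_fn):
        """Leave-one-database-out ("Minus-1 DB") evaluation.

        `databases` maps a database name to its labeled examples. For each
        database, the classifier is trained on the union of all the other
        databases and scored on the held-out one, so every score reflects
        generalization across recording conditions rather than within-set
        similarity. Returns one score per held-out database.
        """
        scores = {}
        for held_out, test_set in databases.items():
            train_set = [ex for name, exs in databases.items()
                         if name != held_out for ex in exs]
            model = train_fn(train_set)
            scores[held_out] = test_fn(model, test_set)
        return scores
    ```

    Unlike self classification, a high Minus-1 DB score cannot be obtained by memorizing database-specific artifacts, which is precisely the shortcoming the method is designed to expose.
    
    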

    Analysis of Sound Signals with High Resolution Matching Pursuit

    Sound recordings contain both transients and sustained parts, and a basis expansion is not rich enough to represent all such components efficiently. Pursuit algorithms instead choose the decomposition vectors according to the signal's properties, selecting them from a dictionary much larger than a basis. Matching pursuit is fast to compute but can provide coarse representations; basis pursuit gives a better representation but is very expensive in computation time. This paper develops a high resolution matching pursuit: a fast, high time-resolution, time-frequency analysis algorithm well suited for musical applications.
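    The plain matching pursuit that the paper refines can be sketched in a few lines: greedily pick the dictionary atom most correlated with the residual, subtract its contribution, and repeat. This shows the baseline algorithm only; the high-resolution selection criterion is not reproduced here:

    ```python
    import numpy as np

    def matching_pursuit(signal, dictionary, n_iter=10):
        """Plain matching pursuit over a redundant dictionary.

        `dictionary` is a (n_atoms, signal_length) array of unit-norm
        atoms. At each step the atom most correlated with the residual
        is selected and its contribution subtracted. This greedy choice
        is what can produce the coarse representations that high
        resolution matching pursuit is designed to avoid.
        """
        residual = signal.astype(float).copy()
        atoms = []  # (atom index, coefficient) pairs
        for _ in range(n_iter):
            corr = dictionary @ residual
            k = int(np.argmax(np.abs(corr)))
            atoms.append((k, corr[k]))
            residual = residual - corr[k] * dictionary[k]
            if np.linalg.norm(residual) < 1e-10:
                break
        return atoms, residual
    ```

    With an orthonormal dictionary this recovers the exact expansion; the interesting (and difficult) cases arise when the dictionary is redundant, e.g. a union of windowed sinusoids at many scales.
    
    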

    IrcamCorpusTools: an Extensible Platform for Spoken Corpora Exploitation

    Corpus-based methods are increasingly used for speech technology applications and for the development of theoretical or computer models of spoken languages. These uses range from unit selection speech synthesis [1] to statistical modeling of speech phenomena such as prosody or expressivity [2]. In all cases, they require a wide range of tools for corpus creation, labeling

    Phase Minimization for Glottal Model Estimation
